dynamic dataset
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Recursive Abstractive Processing for Retrieval in Dynamic Datasets
Chucri, Charbel, Azouz, Rami, Ott, Joachim
Recent retrieval-augmented models enhance basic methods by building a hierarchical structure over retrieved text chunks through recursive embedding, clustering, and summarization. The most relevant information is then retrieved from both the original text and generated summaries. However, such approaches face limitations with dynamic datasets, where adding or removing documents over time complicates the updating of hierarchical representations formed through clustering. We propose a new algorithm to efficiently maintain the recursive-abstractive tree structure in dynamic datasets, without compromising performance. Additionally, we introduce a novel post-retrieval method that applies query-focused recursive abstractive processing to substantially improve context quality. Our method overcomes the limitations of other approaches by functioning as a black-box post-retrieval layer compatible with any retrieval algorithm. Both algorithms are validated through extensive experiments on real-world datasets, demonstrating their effectiveness in handling dynamic data and improving retrieval performance.
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (0.93)
- Information Technology > Security & Privacy (0.46)
Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation
Harwood, Ben, Dezfouli, Amir, Chades, Iadine, Sanderson, Conrad
Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large-scale, high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (such as the addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches consider neither the computational cost of updating the index structure nor the rate and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking these considerations into account. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often-used k-d trees method is not suitable for dynamic datasets, as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over the baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than the baseline for recall rates below 75%.
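To make the abstract's terms concrete, here is a minimal sketch of an exhaustive-search baseline and a recall measurement on a dataset that grows online. It is not the paper's evaluation code: the function names `exhaustive_knn` and `recall_at_k`, the random 8-dimensional data, and the "search a random half" stand-in for an ANN method are all assumptions made for illustration.

```python
import math
import random


def exhaustive_knn(points, query, k):
    """Return indices of the k points closest to query (Euclidean), by brute force."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbours that the approximate result found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical dynamic dataset: samples arrive online, so the baseline's
# "index" is just a growing list.
random.seed(0)
points = [[random.random() for _ in range(8)] for _ in range(200)]
query = [0.5] * 8

exact_before = exhaustive_knn(points, query, k=10)

# Adding a new sample online costs nothing extra for the baseline:
# there is no index structure to update.
points.append(list(query))
exact_after = exhaustive_knn(points, query, k=10)

# Toy stand-in for an approximate method (an assumption, not a real ANN index):
# search only a random half of the data, trading recall for less work.
subset = random.sample(range(len(points)), len(points) // 2)
approx = sorted(subset, key=lambda i: math.dist(points[i], query))[:10]
recall = recall_at_k(approx, exact_after)
print(f"recall@10 of the toy approximate search: {recall:.2f}")
```

The baseline pays its full cost at query time and nothing at update time. An index-based method moves work into building and updating the index, which is why the abstract argues that update cost belongs in the evaluation when the dataset changes frequently.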
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.34)
MAHALO: Unifying Offline Reinforcement Learning and Imitation Learning from Observations
Li, Anqi, Boots, Byron, Cheng, Ching-An
We study a new paradigm for sequential decision making, called offline policy learning from observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard qualities: 1) only a subset of trajectories is labeled with rewards, 2) labeled trajectories may not contain actions, 3) labeled trajectories may not be of high quality, and 4) the data may not have full coverage. Such imperfection is common in real-world learning scenarios, and offline PLfO encompasses many existing offline learning setups, including offline imitation learning (IL), offline IL from observations (ILfO), and offline reinforcement learning (RL). In this work, we present a generic approach to offline PLfO, called Modality-agnostic Adversarial Hypothesis Adaptation for Learning from Observations (MAHALO). Built upon the pessimism concept in offline RL, MAHALO optimizes the policy using a performance lower bound that accounts for uncertainty due to the dataset's insufficient coverage. We implement this idea by adversarially training data-consistent critic and reward functions, which forces the learned policy to be robust to data deficiency. We show that MAHALO consistently outperforms or matches specialized algorithms across a variety of offline PLfO tasks in theory and experiments. Our code is available at https://github.com/AnqiLi/mahalo.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning
Hu, Zixuan, Shen, Li, Wang, Zhenyi, Liu, Tongliang, Yuan, Chun, Tao, Dacheng
The goal of data-free meta-learning is to learn useful prior knowledge from a collection of pre-trained models without accessing their training data. However, existing works solve the problem only in parameter space, which (i) ignores the fruitful data knowledge contained in the pre-trained models; (ii) cannot scale to large-scale pre-trained models; and (iii) can only meta-learn from pre-trained models that share the same network architecture. To address these issues, we propose a unified framework, dubbed PURER, which contains: (1) ePisode cUrriculum inveRsion (ECI) during data-free meta training; and (2) invErsion calibRation following inner loop (ICFIL) during meta testing. During meta training, we propose ECI to perform pseudo episode training for learning to adapt quickly to new unseen tasks. Specifically, we progressively synthesize a sequence of pseudo episodes by distilling the training data from each pre-trained model. ECI adaptively increases the difficulty level of pseudo episodes according to the real-time feedback of the meta model. We formulate the optimization process of meta training with ECI in an adversarial form, solved in an end-to-end manner. During meta testing, we further propose a simple plug-and-play supplement, ICFIL, used only during meta testing to narrow the gap between the meta-training and meta-testing task distributions. Extensive experiments in various real-world scenarios show the superior performance of our approach.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > New York (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Dynamic Datasets and Market Environments for Financial Reinforcement Learning
Liu, Xiao-Yang, Xia, Ziyi, Yang, Hongyang, Gao, Jiechao, Zha, Daochen, Zhu, Ming, Wang, Christina Dan, Wang, Zhaoran, Guo, Jian
The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess relative performance via community-wide competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source code for the data curation pipeline is available at https://github.com/AI4Finance-Foundation/FinRL-Meta.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Instructional Material (1.00)
- Overview (0.93)
- Research Report > New Finding (0.45)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
- Information Technology > Services (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Top 8 Cybersecurity Datasets For Your Next Machine Learning Project
Machine learning techniques play a critical role in detecting serious threats in a network. A good dataset helps create robust machine learning systems that address network security problems such as malware attacks, phishing, and host intrusion. For instance, real-world cybersecurity datasets will help you work on projects such as network intrusion detection systems and network packet inspection systems using machine learning models. Here is a list of the top 8 cybersecurity datasets you can use for your next machine learning project. About: The ADFA Intrusion Detection Datasets are designed for evaluating system-call-based host intrusion detection systems (HIDS).
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.83)
Exploiting Points and Lines in Regression Forests for RGB-D Camera Relocalization
Meng, Lili, Tung, Frederick, Little, James J., Valentin, Julien, de Silva, Clarence
Camera relocalization plays a vital role in many computer vision, robotics, augmented reality (AR) and virtual reality (VR) applications. In the real world, camera relocalization has enabled recent consumer robotics products such as the Dyson 360 Eye and iRobot Roomba 980 to recognize places they have previously visited [1]. In AR/VR products such as Hololens and Oculus Rift, camera relocalization helps to correctly overlay visual objects in an image sequence or the real world. Scene Coordinate Regression Forests (SCRF) [2] pioneered the use of machine learning for camera relocalization. In this method, a regression forest is trained to infer an estimate of each pixel's correspondence to 3D points in the world coordinate frame. These correspondences are then used to infer the camera pose with a robust optimization scheme. Since then, various machine learning based methods, mainly random forest based methods [3], [4], [5], [6], [7], [8], [9] and deep learning based methods [10], [11], [12], [13], [14], [15], have been proposed to accelerate the progress of camera relocalization, in parallel with the traditional but still active feature-based methods [16], [17] and key-frame based methods [18], [19]. These random forest based methods employ either RGB-D/RGB pixel comparison features [2], [3], [5], [7] or sparse features such as SIFT [8], without considering the spatial structure.
- North America > United States (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)